Address: 313 Brookline Street

Needham, MA 02492 
Country of Citizenship: USA 

Name: Guo-Zhong Li 
4. What is the technical field of the invention?

Chromatography. Mass spectrometry. On-line LC/MS separations. 

Describe your present understanding of the invention. 

Introduction 

A key problem in analytical chemistry is the estimation of the concentration of one or more
molecular entities contained within a complex mixture.

Liquid chromatography (LC) followed by mass spectrometry (MS) is a well-known technique
(LC/MS) that can separate large numbers of chemical entities in a sample, thereby facilitating the
concentration measurement, or quantitation, of each. By measuring the exact mass of an entity,
the MS can track the entity between samples. By measuring the response, or intensity, of the
tracked entity, the concentration of that entity can then be followed from sample to sample.

Here we consider samples that are separated by LC, ionized with electrospray ionization, and
analyzed by mass spectrometers such as quadrupoles, time-of-flight analyzers, ion traps, or
combinations of these analyzers, but we do not limit the discussion to such samples. This discussion
also pertains to entities that may be fragmented by MS/MS or MS^n techniques.

In electrospray ionization, small molecules can produce a single characteristic ion. A larger
molecule, such as a peptide or a protein, can produce a set of ions. Algorithms well known in the
art can reduce such sets of ions to a single effective ion. Thus we use the term entity to mean a
single molecule whose concentration can be determined by examination of one ion or of more
than one ion; in either case we assume that an effective mass and retention time can be
assigned to each entity.

In LC/MS, a sample is injected into the system for analysis, so for each injection, the LC/MS 
system measures the retention time, molecular weight, and intensity of each entity. 

Comparison of intensities of corresponding entities between injections is the basis of, for
example, determining whether the concentration of an entity changes significantly between control
and unknown samples. Changes in the expression level of a protein between samples will manifest
themselves as changes in the protein's concentration between samples.

A set of samples may be processed via sequential injections. The same sample may be injected
multiple times to provide a set of replicate injections. As an example, one may inject each of two
distinct samples (a control and an unknown) three times, to produce a total of six injections. From
these data, the reproducibility of the concentration measurements can be inferred for each entity,
as well as the change in concentration of each entity between the control sample and the
unknown sample. Each sample may contain an amount of an internal standard to provide a
relative calibration between samples.

For a technique to determine the concentration of any entity, it must first adequately resolve that
entity from all others. The LC/MS technique allows us to separate entities (or the ions associated
with an entity) in both mass and retention time. Entities that co-elute in retention time, and that
would otherwise be indistinguishable, can be resolved in mass, thus allowing an accurate
estimate of their intensity.

The problem 

But to associate, or to track, an entity from one injection to another, accurate mass alone may or 
may not be sufficient. It is this issue that this disclosure will address. 

To see the issue, consider the properties of mass and retention time of a molecule. 

The molecular weight is an intrinsic property of a molecule. A mass spectrometer measures the
ratio of molecular weight to charge, m/z. We use the symbol $\mu$ to indicate the mass-to-charge
ratio, thus $\mu = m/z$. Values for $\mu$ can be compared directly between injections. Any variations in
measured values of $\mu$ between injections for the same entity must be due only to instrumental
noise sources.

For samples such as peptides or proteins, electrospray ionization may allow us to determine the
charge state, $z$, which then allows us to infer the molecular weight $m$ of the entity. So we may
track entities by their molecular weight. For the purposes of this disclosure we can use the
empirically observed value for $\mu$ or the inferred value $m$ interchangeably.

With sufficiently high mass accuracy, each entity is potentially uniquely distinguishable based
upon its value for $\mu$. Thus, for a sample containing few entities, analyzed with a high-accuracy
mass spectrometer such as a time-of-flight (TOF) analyzer with resolution $m/\Delta m \approx 20{,}000$,
and with sufficient chromatographic resolution to separate the entities, each entity can be tracked
from one injection to another based upon accurate measurements of $\mu$ alone.

[Note that we are using a molecule's value for $m$ or $\mu$ to track it between injections. We are not
necessarily using $m$ or $\mu$ to identify the entity in the sense of determining its chemical
composition or structure. We are using the molecular weight as an empirical, and possibly unique,
identifier of the chemical.]

It may be that mass alone is not sufficient to track an entity from one injection to another. If mass
accuracy is low and the sample is complex, the mass of an entity seen in one injection may match
the empirically observed mass of an unrelated entity in another injection.
For example, we may have two entities whose values of $\mu$ are 1024.200 amu and 1024.300 amu.
These entities are distinguishable when the MS resolution (the smallest mass difference it can
resolve) is finer than 0.100 amu, but they will not be distinguishable when it is coarser than 0.100 amu.

The chromatographic retention time of an entity can be an additional, potentially independent
identifier of that entity. However, a molecule's retention time is not an intrinsic property. Its
value depends on the interactions of the molecule with the liquid and solid phases in the
chromatographic separation, among other effects. But even though the retention time is not
intrinsic, its value can be made highly reproducible for a given separation method. Ideally, if the
retention time were exactly reproducible to high accuracy, then agreement in both mass and
retention time could well be sufficient to allow each entity to be uniquely tracked from one
injection to another. That is, it would be highly unlikely for two different entities to share exactly
the same retention time and mass.

However, retention time is not exactly reproducible between injections. Retention time of a 
molecule can wander from injection to injection. 

The prior art 

But there can be regularities in retention time that we can take advantage of. An object of this
disclosure is to show how certain regularities in retention time, combined with a high-mass-accuracy
MS, can allow us to reduce or eliminate the ambiguity that may occur with comparisons
of $\mu$ alone.

There are four regularities in retention time that we consider. The first is well known in the prior art;
the other three are not.

The first regularity is that if an entity elutes in injection A at time $t$, then that entity will elute in
another injection, B, within a window $t \pm \Delta t_c$. Though retention time is not exact, its
wander is bounded from one injection to another. This bound can be determined empirically. We
call this the coarse retention time window, $\Delta t_c$.

Given such a bound, we can then ask: can two entities that elute within the bound have the same
(i.e., measured to be the same) value for $\mu$? If all entities that lie within $\Delta t_c$ have different
values for $\mu$, then we can track entities based upon similarity of retention time and matching
values of $\mu$.

In this disclosure, we consider samples that are complex enough that within the retention time
window $\Delta t_c$ there can still be a significant number of entities whose values for $\mu$ do not
render them unique.

Thus we consider a situation where most, but not all, entities in a mixture have unique masses.
The aim of this invention is to take advantage of the entities that do have unique masses, and of
previously unrecognized regularities in chromatographic retention time, in order to uniquely track
the remaining entities that might otherwise not be distinguished by mass alone.

Examples of such samples are peptide digests derived from natural protein samples.
Peptide digests of blood serum, for example, can contain 10,000 or more distinct peptides. In a
chromatographic separation, 30 or more peptides can elute within the width of a single
chromatographic peak. The question then becomes whether the molecular weights of these 30
peptides can be relied upon to be unique in all cases.

The invention 

The idea behind this disclosure is to take advantage of three additional, but previously
unrecognized, regularities in the retention time behavior of entities.

The second regularity occurs when two different chemical entities elute at the same retention
time in all separations. That is, if they elute at the same time in one separation, then they
elute at the same retention time in all other separations. The retention time of the pair
may change from separation to separation, but if the difference in their retention times is zero in
one separation, it will be zero in all separations.

This regularity does occur in the important case of peptide mixtures: two peptides that elute at
the same retention time in one separation will elute at the same retention time in all other
separations, even though that common retention time may shift from separation to separation.

This disclosure then restricts itself to samples where we can count on this regularity: two entities
that elute at the same retention time in one separation will elute at the same retention time in all
other separations.

The third regularity will, at first, seem at odds with the second. There are intrinsic measurement
errors associated with retention time. Thus two entities that elute at the same retention time in all
injections will in fact show somewhat different measured elution times. The measured retention
times will match only on average. We can think of these as statistical errors associated with
locating the tops of peaks. Thus if an entity elutes at 10.0 min in an injection, its measured
retention time might vary by, say, up to +/-0.2 minutes. Normally, this intrinsic variation in retention
time is masked by the retention time wander between injections. This intrinsic variation can be
measured by tracking two entities that both elute at 10.0 minutes in an injection.

Generally this statistical error is much less than the wander error described by $\Delta t_c$. We call the
threshold associated with the statistical measurement error the fine retention time threshold, $\Delta t_f$.



A fourth and final regularity occurs for entities that elute closely in time, but not at exactly the
same retention time. The retention times of those two entities, and the offset between them, may
change from separation to separation. However, if a third entity elutes between those two, it will
always elute between those two.
For example, as a result of retention time wander, the time offset between two entities may
change from injection to injection. Suppose the entities elute at 2.0 and 2.4 minutes in one
injection and at 2.5 and 2.7 minutes in a second injection. While it is true that the first entity's
retention time drifted by 0.5 minutes between the injections, it is not this quantity that we are
interested in. Rather, it is the difference in retention times between entities one and two. That
difference was 0.4 minutes in the first injection and 0.2 minutes in the second injection. The fourth
regularity pertains to a third entity that elutes in injection 1 between these two times; say that it
elutes at 2.1 minutes. The fourth regularity is that if the third entity elutes between entities 1 and 2
in injection 1, then it will also elute between entities 1 and 2 in other injections. Moreover, its offset
is proportional. Thus in injection 2 the third entity will elute at 2.55 minutes.
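The proportional offset in this example can be checked as a linear interpolation between the two bracketing entities. Writing $t_n^{(j)}$ for the retention time of entity $n$ in injection $j$ (notation introduced only for this check):

$$t_3^{(2)} = t_1^{(2)} + \frac{t_3^{(1)} - t_1^{(1)}}{t_2^{(1)} - t_1^{(1)}}\left(t_2^{(2)} - t_1^{(2)}\right) = 2.5 + \frac{2.1 - 2.0}{2.4 - 2.0}\,(2.7 - 2.5) = 2.5 + 0.25 \times 0.2 = 2.55\ \text{min}$$

The third entity sits one quarter of the way from entity 1 to entity 2 in both injections, which is what "proportional" means here.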

Regularities 1 and 3 must exist in all chromatographic separations; they are the characteristics of a
reproducible measurement, or a robust method. Regularities 2 and 4 may or may not occur for all
entities in a complex mixture. But they do occur for peptide digests, and likely hold for mixtures
where the entities have related chemical interactions with the chromatographic stationary and
mobile phases. The method described here can recognize when these regularities occur, and
when they do, this method can take advantage of them for the purpose of tracking entities from
injection to injection.

The method to be disclosed takes advantage of these regularities to assign each entity a 
reference retention time. This reference retention time is unique in the sense that if two entities do 
not have the same reference retention time, they cannot be the same entity. If they do have the 
same reference retention time, they can be the same entity. 

Entities are then tracked by requiring that they have the same molecular weight and the same
reference retention time. Entities that differ significantly in molecular weight, in reference
retention time, or in both are not the same.

To summarize: In complex separations, more than one entity may have the same molecular
weight, to within the ability of the instrument to distinguish. This invention makes use of
accurate mass measurement and hitherto unrecognized regularities in retention time to determine
a retention time map. The map then allows the assignment of a reference retention time to each
entity in a separation. The reference retention times of entities can then be compared between
separations.
The Method

The method disclosed here assigns a reference retention time to each entity in each injection. It is 
the reference retention time that can be directly compared and thus can be used to track entities 
between injections.

Consider two injections A and B. The method compares entities in A to those in B. From the 
results obtained from this comparison, the method assigns reference retention times to entities in 
injection B. Given a third injection, C, the method compares entities in A to those in C to obtain 
reference retention times for C. The reference retention times assigned to entities in B and C can 
then be directly compared to each other and/or to the retention times in A. The method, in effect, 
removes the effect of retention time drift between injections B and A, and between injections C and
A, for each entity in B and C.

The method can then be extended to as many injections of as many samples as desired. 

In summary, the method first uses a subset of entities in A and B to obtain a retention-time map. 
It is from this map that a reference retention-time is obtained for all entities in B. 

Given injections A and B, the method for determining the retention-time map between A and B is as
follows:

1) We use a subset of entities in injection A and B to first construct a retention time map. It is 
this map that will be used to obtain the reference retention times. 

2) We can choose subsets of entities in A and B based upon their intensity. For example, we
could consider only entities above a threshold intensity. In the preferred method, the threshold
applied to entities in A is the median of the intensities in injection A. Similarly, for injection B
we consider all entities whose intensities lie above the median intensity in injection B.

3) We could normalize the intensities from injection A or B either before or after applying a 
threshold-intensity. 

4) To construct the retention time map, we choose a coarse retention time threshold $\Delta t_c$. The
preferred value is +/-5 minutes. This value describes the maximum wander that occurs in retention
time. It will be confirmed, and possibly refined, upon examination of the mapping results in a
later step.

5) Choose a molecular weight threshold $\Delta m$. The threshold could also be expressed in parts
per million, $(\Delta m / m) \times 10^6$. We could also choose an m/z threshold $\Delta\mu$. The method
disclosed here assumes that this threshold has been obtained through knowledge of the
properties of the MS. This threshold can be obtained by methods known in the art.
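A ppm threshold and an absolute threshold $\Delta m$ are related through the mass at which the tolerance is evaluated. The sketch below is a minimal illustration, assuming the ppm tolerance is evaluated at the mass of the entity from injection A; the function names are hypothetical and not part of any existing software.

```python
def ppm_to_amu(mass: float, ppm: float) -> float:
    """Convert a relative tolerance in parts per million into an absolute tolerance (amu)."""
    return mass * ppm * 1e-6


def amu_to_ppm(mass: float, delta_m: float) -> float:
    """Express an absolute mass tolerance (amu) in parts per million of `mass`."""
    return delta_m / mass * 1e6


def within_mass_threshold(m1: float, m2: float, ppm: float) -> bool:
    """True if |m1 - m2| < Delta m, where Delta m is the ppm tolerance evaluated at m1."""
    return abs(m1 - m2) < ppm_to_amu(m1, ppm)


# Values taken from the 1024.200 / 1024.300 amu illustration earlier in the text.
print(ppm_to_amu(1024.2, 20.0))                         # about 0.0205 amu
print(amu_to_ppm(1024.2, 0.100))                        # about 97.6 ppm
print(within_mass_threshold(1024.200, 1024.300, 20.0))  # False
```

At roughly 1 kDa, a 0.100 amu separation corresponds to about 98 ppm, so a tolerance of a few tens of ppm keeps the two example entities apart.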

6) We then perform a search that compares all threshold-selected entities in injection A to those
in B. This search finds those entities in A that have a single match to a threshold-selected
entity in injection B. Two entities match if the difference in their masses falls below the mass
threshold $\Delta m$ AND the difference in their retention times falls within the coarse retention
time threshold $\Delta t_c$ AND there is only one entity in B that meets those criteria AND the
intensities of both entities lie above the respective median intensities. Such search methods are
well known in the art.

7) This map will then contain only pairs of entities that have unique matches in molecular weight
and in coarse retention time, and that meet any intensity requirements.

Thus we have obtained a set of N pairs of entities, each indicated by a subscript $i$. Each pair
satisfies the properties:

$$|m_i^A - m_i^B| < \Delta m$$

$$|t_i^A - t_i^B| < \Delta t_c$$

$$I_i^A > \mathrm{median}(I^A)$$

$$I_i^B > \mathrm{median}(I^B)$$



It is within the scope of this method to add other restrictions. For example, one could require that
the intensity ratios fall within a threshold. The search would then produce entities that satisfy

$$\frac{I_i^A}{I_i^B} < r \quad \text{and} \quad \frac{I_i^B}{I_i^A} < r$$

A preferred value for $r$ might be 2. This additional intensity restriction is optional and is not
required by the method.

If one is comparing ions of known charge state, one could require that the charge states match.
The search would then produce entities such that $z_i^A = z_i^B$.
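Steps 2 and 4 through 7, together with the optional intensity-ratio and charge-state restrictions, can be sketched as a brute-force search. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Entity` record and `coarse_search` function are hypothetical names, intensities are assumed positive, and uniqueness is checked only in the A-to-B direction, as the text describes. A production search would typically sort entities by mass rather than compare every pair.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional, Tuple


@dataclass
class Entity:
    mass: float                   # molecular weight m (or m/z), amu
    rt: float                     # measured retention time t, minutes
    intensity: float              # response I (assumed > 0)
    charge: Optional[int] = None  # charge state z, if known


def coarse_search(a: List[Entity], b: List[Entity], delta_m: float, delta_tc: float,
                  r: Optional[float] = None,
                  match_charge: bool = False) -> List[Tuple[Entity, Entity]]:
    """Return (A, B) pairs for A entities that have exactly one coarse match in B."""
    # Keep only entities above the median intensity of their own injection.
    med_a = median(e.intensity for e in a)
    med_b = median(e.intensity for e in b)
    sel_a = [e for e in a if e.intensity > med_a]
    sel_b = [e for e in b if e.intensity > med_b]

    pairs = []
    for ea in sel_a:
        matches = []
        for eb in sel_b:
            if abs(ea.mass - eb.mass) >= delta_m:
                continue  # fails the mass threshold
            if abs(ea.rt - eb.rt) >= delta_tc:
                continue  # outside the coarse retention time window
            if r is not None and not (ea.intensity / eb.intensity < r
                                      and eb.intensity / ea.intensity < r):
                continue  # optional intensity-ratio restriction
            if match_charge and ea.charge != eb.charge:
                continue  # optional charge-state restriction
            matches.append(eb)
        if len(matches) == 1:  # ambiguous (multiple) or missing matches are dropped
            pairs.append((ea, matches[0]))
    return pairs
```

An A entity with two candidate partners in B inside the coarse window is discarded rather than guessed at; only unambiguous pairs go on to build the map.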

[Figure 1a, 1b] Figures 1a and 1b plot the results of this coarse search. The quantity
$\Delta t_i = t_i^B - t_i^A$ is plotted on the y-axis versus $t_i^B$ on the x-axis. Note that most of the
points cluster along a dense backbone. There is scatter about the backbone, and there are outliers.

Given this list of matched pairs, the next step is to construct the retention time map. To do this,
we sort the list of pairs so that the values of $t_i^B$ are in ascending time order, thus
$t_{i+1}^B > t_i^B$ for $i = 1, \ldots, N-1$. These pairs result from the coarse search.

Examination of the plot in Figure 1 confirms the validity of the choice of the value for $\Delta t_c$. It
can also suggest a refined value; e.g., for small-amplitude excursions, the value for $\Delta t_c$ could be
reduced. If it appears that the excursions exceed the initial value for $\Delta t_c$, the value could be
increased, and steps 1-6 can be repeated with the larger value for $\Delta t_c$.

8) The next step in obtaining the map is to filter the backbone. That is, we wish to find a refined
value for $\Delta t_i = t_i^B - t_i^A$ as a function of $t_i^B$. We could do this by applying a
moving-average filter, replacing each value of $\Delta t_i$ with a weighted average of its neighbors.
However, since there are outliers, we employ a median filter. Thus we replace each value of
$\Delta t_i$ with the median value of itself and its M nearest neighbors. Typical values for M will be at
least 5 points, or as many as 20 for the data sets we consider. [Figure 2.] Figure 2 shows the
filtered results for a 5-point median filter. We see how the outliers are removed by the median
filter.

We now have a set of median-filtered offsets $\widetilde{\Delta t}_i$ at the times $t_i^B$. We now obtain

$$t_i^{B*} = t_i^B - \widetilde{\Delta t}_i$$

We now have N pairs of values, $(t_i^B, t_i^{B*})$. It is these pairs that constitute the retention time
map. The retention time map is thus a point-to-point look-up table described by these paired
values. The retention time map is obtained from a subset of entities. The map is central to
the determination of the reference retention time for all entities in B.
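The construction of the look-up table from the coarse-search pairs (sorting, median filtering, and subtracting the filtered offset) can be sketched as follows. This is a hypothetical illustration: `median_filter` and `build_rt_map` are names introduced here, the `window` parameter is the total number of points in the filter (matching the 5-point filter of Figure 2), and the window is simply truncated at the ends of the list, an edge treatment the disclosure does not specify.

```python
import numpy as np


def median_filter(values: np.ndarray, window: int = 5) -> np.ndarray:
    """Replace each value with the median of a centred window of `window` points.

    Near the ends of the array the window is truncated rather than padded.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(values)
    return np.array([np.median(values[max(0, i - half):min(n, i + half + 1)])
                     for i in range(n)])


def build_rt_map(t_a, t_b, window: int = 5):
    """Build the look-up table (t_i^B, t_i^{B*}) from N coarse-search pairs.

    t_a, t_b: retention times of the matched pairs in injections A and B.
    Returns (t_b_sorted, t_ref), where t_ref = t_b_sorted - median-filtered offsets.
    """
    t_a = np.asarray(t_a, dtype=float)
    t_b = np.asarray(t_b, dtype=float)
    order = np.argsort(t_b)                    # sort pairs by ascending t^B
    t_b_sorted = t_b[order]
    offsets = t_b_sorted - t_a[order]          # Delta t_i = t_i^B - t_i^A
    smoothed = median_filter(offsets, window)  # suppresses outliers off the backbone
    return t_b_sorted, t_b_sorted - smoothed
```

Because the median ignores a lone outlier in each window, a mismatched pair lying off the backbone is assigned the backbone offset rather than its own.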

Given this map, we assign reference retention times as follows.
The entities in B fall into two categories: those that are part of the look-up table, and those
that are not.

9) For the entities in B that are part of the look-up table, the reference retention time is simply
$t_i^{B*}$, as given above.

10) For the entities that are not part of the look-up table, we make the assignment based upon a
linear interpolation:

$$t_k^{B*} = t_i^{B*} + \frac{t_k^B - t_i^B}{t_{i+1}^B - t_i^B}\left(t_{i+1}^{B*} - t_i^{B*}\right)$$

where $t_{i+1}^B > t_k^B > t_i^B$, and where the subscripts $i$ and $i+1$ specify entries in the retention
time map, i.e., in the look-up table. The entities specified by the subscript $k$ are entities not in the
look-up table. It is this equation that specifies how reference retention times are found, and it is
the central object of this invention.
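The interpolation in step 10 can be sketched with NumPy's `np.interp`. Interpolating the offset $t^B - t^{B*}$ between the bracketing map entries is algebraically the same as interpolating $t^{B*}$ itself, and it gives a simple rule outside the map range: here the offset at the nearest end of the map is held constant. That extrapolation rule is an assumption, since the interpolation equation only covers entities lying between two map entries. The function name is hypothetical.

```python
import numpy as np


def reference_times(t_query, map_t_b, map_t_ref):
    """Assign reference retention times t_k^{B*} to entities observed at t_query in B.

    map_t_b must be sorted in ascending order (as produced when building the map).
    Between map entries this is linear interpolation of t^{B*}; outside the map range
    the offset at the nearest end is held constant (an assumption).
    """
    map_t_b = np.asarray(map_t_b, dtype=float)
    map_offset = map_t_b - np.asarray(map_t_ref, dtype=float)  # t_i^B - t_i^{B*}
    t_query = np.asarray(t_query, dtype=float)
    # np.interp clamps to the end values outside [map_t_b[0], map_t_b[-1]].
    return t_query - np.interp(t_query, map_t_b, map_offset)
```

With map entries taken from the fourth-regularity example (entities at 2.5 and 2.7 minutes in injection B, with reference times 2.0 and 2.4 minutes), an entity observed at 2.55 minutes in B is assigned a reference retention time of 2.1 minutes.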

11) For each entity in A, it is convenient to define its reference retention time as simply its
retention time. Thus $t_i^{A*} = t_i^A$ for all entities in A.

We have now assigned reference retention times for each entity in A and in B. The above
procedure has removed, for each entity, the retention time offset between injections A and B.

12) Given an injection C, we repeat the steps above, replacing B with C, to obtain $t_k^{C*}$.
We can now track all entities between A and B by applying the following method.

13) We choose a fine retention time threshold, $\Delta t_f$. This fine threshold can be obtained from
histogram techniques that make use of the reference retention times of entities in B that are not
part of the map. Typically, the fine threshold can be 0.4 minutes. Thus, we have reduced the
retention time threshold from 5 to 0.4 minutes by this method. This in turn has the effect of
reducing or eliminating ambiguities in comparing entities having the same molecular weight.
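The disclosure does not spell out which histogram technique is used to choose $\Delta t_f$. The sketch below is one simple possibility, offered as an assumption rather than the authors' procedure: histogram the reference-time differences of candidate mass matches across the coarse window, take the median bin count as the level of chance matches, and grow a region around zero while neighboring bins stay above that level. The function name, bin count, and background rule are all hypothetical choices.

```python
import numpy as np


def fine_threshold(dt_ref, half_range: float = 5.0, n_bins: int = 100) -> float:
    """Estimate the fine threshold Delta t_f from reference-time differences.

    dt_ref: differences t^{A*} - t^{B*} for candidate mass matches that are not part of
    the map. True matches pile up near zero; chance matches spread across the coarse window.
    """
    counts, edges = np.histogram(dt_ref, bins=n_bins, range=(-half_range, half_range))
    background = np.median(counts)  # assumed level of chance (random) matches per bin
    centre = int(np.searchsorted(edges, 0.0, side="right")) - 1  # bin containing zero
    lo = hi = centre
    # Grow the peak region outward while neighbouring bins stay above background.
    while lo > 0 and counts[lo - 1] > background:
        lo -= 1
    while hi < n_bins - 1 and counts[hi + 1] > background:
        hi += 1
    # Half-width of the peak region, measured from zero to its farther edge.
    return float(max(abs(edges[lo]), abs(edges[hi + 1])))
```

The bin width limits the resolution of the estimate; with 100 bins over +/-5 minutes the result is quantized to 0.1 minutes.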
[Figure 3] Figure 3 shows the values that satisfy the fine retention time criterion,
$|t_i^{A*} - t_j^{B*}| < \Delta t_f$, after the look-up table (the backbone) is established.

We can now attempt to track all entities in A, B, C, and all other injections. The fine threshold
$\Delta t_f$ is used for the final search.

14) For example, we can compare all entities in B to all entities in A, and retain only those that
meet the following tracking criteria:

$$|m_i^A - m_j^B| < \Delta m$$

$$|t_i^{A*} - t_j^{B*}| < \Delta t_f$$

The search is over any entity (indexed by $i$) in injection A versus any entity (indexed by $j$) in
injection B. Note that the mass window criterion is unchanged. But the retention time criterion is
changed: we compare the reference retention times using the fine search window. We have a
match if both criteria are met. No intensity criteria need be applied, though one could be
optionally applied.
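The final search of step 14 is a double loop over both tracking criteria. The sketch below is a minimal, hypothetical illustration (the function name and argument layout are not from the source); the same function serves for A versus C or C versus B by passing the corresponding masses and reference retention times.

```python
from typing import List, Sequence, Tuple


def fine_track(mass_a: Sequence[float], tref_a: Sequence[float],
               mass_b: Sequence[float], tref_b: Sequence[float],
               delta_m: float, delta_tf: float) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) with |m_i^A - m_j^B| < delta_m and
    |t_i^{A*} - t_j^{B*}| < delta_tf."""
    hits = []
    for i, (ma, ta) in enumerate(zip(mass_a, tref_a)):
        for j, (mb, tb) in enumerate(zip(mass_b, tref_b)):
            # Mass criterion is unchanged; retention time is compared on reference times.
            if abs(ma - mb) < delta_m and abs(ta - tb) < delta_tf:
                hits.append((i, j))
    return hits
```

For example, with two A entities of mass 1024.2 amu at reference times 10.0 and 12.0 minutes, and B entities of 1024.205 and 1024.21 amu at reference times 10.1 and 12.3 minutes (illustrative values only), a 5-minute window returns all four pairings, whereas a 0.4-minute window returns only (0, 0) and (1, 1).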

As mentioned, an additional injection can be accommodated by comparing all entities in C to all
entities in A, and retaining only those that meet the following criteria:

$$|m_i^A - m_j^C| < \Delta m$$

$$|t_i^{A*} - t_j^{C*}| < \Delta t_f$$

or by comparing all entities in C to all entities in B, and retaining only those that meet the following
criteria:

$$|m_i^C - m_j^B| < \Delta m$$

$$|t_i^{C*} - t_j^{B*}| < \Delta t_f$$



Note that even though A is used as the common target for the retention time reference
computation, reference retention times can then be compared between any two injections, such
as between C and B. Thus a completely symmetric comparison for the purpose of entity tracking
is provided by this method.

Entities can then be tracked as follows: two entities are the same if they have the same molecular
weight (within a previously specified error) and the same reference retention time (within a
previously specified error). The errors can be determined from the properties of the data themselves.

Traditionally, tracking is done by including internal standards to align retention times. These
internal standards are chemical entities of known mass, injected at known concentration. The
method proposed here does not require such internal standards. Nor does it have to know,
a priori, which entities appear with unique masses. In effect, the exact mass measurements allow
each entity in the subset that defines the map to act as a local retention time standard.

The invention is a method to define and specify reference retention times for entities. The use of
this invention is entity tracking between injections. Entity tracking in turn allows the analyst to
quantify or track relative changes in concentration of entities between samples in a sample set.

The assignment of reference retention times requires that there be a coarse and a fine retention
time threshold. The coarse threshold is a not-to-exceed bound on retention time wander. The fine
threshold bounds the variation about zero of the difference in reference retention times. All unique
mass matches at high signal-to-noise ratio (SNR) are found within the coarse threshold.

Applications of quantitation 

Once an entity is tracked from injection to injection, the quantitative changes in concentration of 
the entity between samples can be measured. The quantitative response is the response as 
measured by the LC/MS system for the ion or set of ions that define an entity. 

For example, consider an experiment that consists of N replicate injections for each of M samples.
The mean, median, standard deviation, and coefficient of variation can be obtained for mass,
intensity, and retention time for all entities tracked within each subset of N replicate injections. The
means of these quantities can be similarly tracked for each entity between the M samples.
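The per-entity replicate statistics can be computed with Python's standard `statistics` module. This is a minimal sketch; the function name is hypothetical, the sample (N - 1) standard deviation is assumed, and the coefficient of variation is taken as the standard deviation divided by the mean, expressed in percent.

```python
from statistics import mean, median, stdev
from typing import Dict, Sequence


def replicate_summary(values: Sequence[float]) -> Dict[str, float]:
    """Summarize one tracked quantity (mass, intensity or retention time) of one entity
    over N replicate injections. Requires N >= 2 for the standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (N - 1 in the denominator)
    return {"mean": m, "median": median(values), "sd": s, "cv_percent": 100.0 * s / m}
```

Applied to each tracked entity within each group of replicates, this yields the reproducibility figures described above; the per-sample means can then be compared across the M samples.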

The response of each entity as a function of sample can be input to standard statistical analysis
packages. Software such as SIMCA (Umetrics, Sweden) or Pirouette (Infometrix, Woodinville,
Washington, USA) can take as input the list of tracked entities produced by this method and reveal
changes in entities between sample populations. The SIMCA and Pirouette packages and other
software systems provide principal component analysis or factor analysis techniques that can be
applied to these data.

In particular, tryptic peptides that are digestion fragments of a common protein will have their
intensities change in concert from sample to sample. Consider the following: one sample or set of
samples contains a protein that is expressed at one level, and another sample or set of samples
contains the same protein expressed at a different concentration level. If tryptic digestion is
performed, then the concentrations of the tryptic peptides associated with that protein will scale
from one sample to another. That is, the concentrations will form one distinct pattern in one
sample, and a similar pattern in another sample, but with intensity values scaled overall to be
larger or smaller, in response to a larger or smaller concentration of the parent protein.

Such correlated changes in concentration can be readily seen by factor analysis methods or by
methods based on principal component analysis (PCA). Such a method can be used to identify the
parent proteins whose concentration, or expression level, has changed from sample to sample.
That is, if a set of peptides produces a distinctive signature in a PCA plot, and those peptides
point to a common parent protein, then the protein whose expression level has changed has been
identified.
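The idea that peptides from a common protein move together can be illustrated with a bare-bones principal component analysis computed by singular value decomposition in NumPy. This is a stand-in for illustration only, not the SIMCA or Pirouette algorithms: rows are injections, columns are tracked entities, the data are mean-centred but not scaled (packages often offer unit-variance or Pareto scaling), and the function name is hypothetical.

```python
import numpy as np


def pca(x: np.ndarray, n_components: int = 2):
    """PCA by singular value decomposition.

    x: intensity matrix with one row per injection and one column per tracked entity.
    Returns (scores, loadings, explained_variance_ratio) for the first n_components.
    """
    xc = x - x.mean(axis=0)                          # mean-centre each entity's column
    u, s, vt = np.linalg.svd(xc, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # sample coordinates
    loadings = vt[:n_components].T                  # one row per entity
    explained = s**2 / np.sum(s**2)                  # fraction of variance per component
    return scores, loadings, explained[:n_components]
```

For instance, if two peptides of one protein triple in intensity between two samples while a third, unrelated peptide stays constant, the first component captures essentially all the variance, the two related peptides load on it with the same sign, and the unrelated peptide has a near-zero loading.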

A definitive identification can be made by taking the exact masses of these associated peptides (the
ones that change in concert) and identifying them using standard peptide mass fingerprinting
software, such as that offered at www.matrixsciences.com or prospector.ucsf.edu.
5. What problem(s), if any, now known to you does the invention solve or attempt to solve?

The problem the invention solves is the tracking of chemical entities in a complex mixture from injection to 
injection. Given this ability to track entities, one can then track and trend changes in the responses, or 
intensities, and, ultimately concentrations of each entity. The ability to track concentration is key to 
discovering changes that may occur between control and unknown samples. 

6. Describe any now envisioned commercial applications for the invention: 

This method may be employed in the Ion mapping software product (under development) and in the 
MarkerLynx product that is now commercialized. Additional products may be Waters Empower software 
or future software products. 



7. Conception and Disclosure 

a. Date you presently believe the invention was first conceived: June 20th, 2003.

b. Date and form of what you presently believe is the first written description of invention: 
October 23rd, 2003

c. Date and circumstances of what you presently believe is the first oral disclosure of the 
invention to another: July 2003 

d. Identify all publications or other documents that you presently believe disclose or
describe the invention: None

e. Has a model or prototype been constructed? Yes_X_ No 

f. Dates model/prototype were commenced and completed: July 15, 2003

8. Testing and Reduction to Practice 

a. Date of first test: July 15, 2003 

b. Witness(es), if any: Scott Geromanos, Tim Riley, Jeff Silva 

c. Date and description of what you presently believe was the first reduction to practice of 
the invention: July 15, 2003 
9. First Sale or Offer for Sale of a Product or Process Embodying the Invention
Not applicable. 

a. Date and circumstances of what you presently believe was the first offer for sale: 

b. Date and circumstances of what you presently believe was the first sale: 

c. Date and circumstances of what you presently believe was the first public use and/or
demonstration of invention: 

d. If the invention has not yet been used, date and circumstances of any presently planned 
public use or demonstration: 

10. Identify any prior publications or patents or other work now known to you that, in your present 
opinion, relate to the subject matter of the invention. 

11. A Third Party Rights 

a. If any work relating to the invention was performed under a contract or funding
arrangement with any governmental agency, please attach a copy of any documents
relating to that contract or funding arrangement.

b. If any work relating to the invention was performed under an employment or consulting 
agreement, please attach a copy of such agreement. 

12. Attach copies of the following materials, where applicable, relating to the invention and its salient
features: photographs, engineering notebook excerpts, blueprints, videotapes, test results.



Inventor Signature Date 



Inventor Signature Date 



Witnessed by 



Witness Signature Date 



Witness Name 



Witness Address 
FIGURE 1a. The results of the coarse search. The horizontal axis is retention time. The vertical axis is
the difference in retention time. Points are plotted only if entities have masses that agree within a mass
window of 0.020 amu and have retention time differences that are within +/-5 minutes. This +/-5-minute
window is the coarse retention time window.
FIGURE 1b. The results of the coarse search. The horizontal retention time axis is expanded to show how
concentrated the matched pairs are along the vertical axis. This concentration reveals the backbone, which is
the basis of the retention time map. The vertical axis is the difference in retention time. Points are plotted
only if entities have masses that agree within a mass window of 0.020 amu and have retention time
differences that are within +/-5 minutes. This +/-5-minute window is the coarse retention time window.
Figure 2a. After backbone selection by median filter and application of the fine retention time criteria.
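The backbone selection of Figure 2a can be sketched in Python along the lines of the MATLAB listings in this disclosure: a running median of the sorted retention time differences forms the backbone, a robust sigma is taken as the median absolute nonzero residual divided by 0.67, and the fine limits are the backbone +/-4 sigma. The listings do not show how their `medianFilter` helper treats the ends of the data, so the truncated window used here is an assumption, and the function names are hypothetical.

```python
from statistics import median
from typing import List, Tuple


def running_median(values: List[float], width: int = 5) -> List[float]:
    """Centered running median; the window is truncated at both ends
    (assumed behaviour of the listings' medianFilter helper)."""
    half = width // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


def backbone_limits(delta_rt: List[float], width: int = 5, n_sigma: float = 4.0
                    ) -> Tuple[List[float], List[float], List[float], float]:
    """delta_rt must already be sorted by reference retention time.
    Returns (backbone, upper limit, lower limit, robust sigma)."""
    backbone = running_median(delta_rt, width)
    residuals = [d - b for d, b in zip(delta_rt, backbone)]
    # Exact zeros occur where the median equals the point itself; drop them.
    nonzero = [abs(r) for r in residuals if r != 0.0]
    # median(|r|)/0.67 approximates the standard deviation of Gaussian residuals.
    sigma = median(nonzero) / 0.67 if nonzero else 0.0
    upper = [b + n_sigma * sigma for b in backbone]
    lower = [b - n_sigma * sigma for b in backbone]
    return backbone, upper, lower, sigma
```

Because the median ignores isolated outliers, a single spurious match (such as a coincidental mass agreement far from the backbone) neither shifts the backbone nor inflates sigma.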
Figure 3. Example of applying the fine retention time criterion. The solid blue line shows the fine retention time
limits plotted versus retention time. In this example the fine retention time window is +/-0.4 minutes, which is
smaller than the +/-5 minute window used in the coarse search. The backbone line is not drawn, but it is easily
inferred from the limits of the fine search: it would lie midway between the upper and lower limits drawn above.
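Applying the fine criterion means evaluating the upper and lower limits at each pair's retention time and keeping the pairs that fall strictly between them. The MATLAB listings do this with `interp1`; the Python sketch below uses a hand-written linear interpolation (hypothetical helper names) that clamps at the ends of the table, whereas `interp1` returns NaN outside the tabulated range. As in the listings, the reference retention times must be strictly increasing, which is why the listings delete coincident retention times before interpolating.

```python
from bisect import bisect_left
from typing import List, Tuple


def interp_linear(x: float, xs: List[float], ys: List[float]) -> float:
    """Linear interpolation of ys over strictly increasing xs,
    clamped to the end values outside [xs[0], xs[-1]]."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_left(xs, x)  # xs[j - 1] < x <= xs[j]
    x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def fine_search(pairs: List[Tuple[float, float]], ref_rt: List[float],
                upper: List[float], lower: List[float]
                ) -> List[Tuple[float, float]]:
    """Keep (retention time, difference) pairs inside the interpolated limits."""
    return [(rt, d) for rt, d in pairs
            if interp_linear(rt, ref_rt, lower) < d < interp_linear(rt, ref_rt, upper)]
```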



Marc Gorenstein et al.






2/13/2004 



Part II 




[Figures (Part II): plot and diagram pages not recoverable from the scan. Surviving labels include "Protein", "Pep_mh", "b & y ion", "Intensity", "Number of peptides", "DeltaRetTime (min)", and "Injection".]
Replication     Number of     Peptide       % of Total
(X out of 3)    Peptides      Intensity     Peptide Intensity
3               3 X 4356      1.84e+08      91.52%
2               2 X 1984      8.57e+06      4.27%
1               1 X 6070      8.47e+06      4.21%
Total           23106         2.01e+08      100%
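The replication summary above can be tabulated from per-peptide results. A minimal Python sketch, assuming each target peptide carries the number of injections (out of 3) in which it was found and a single intensity value; the disclosure does not say how intensities were aggregated across injections, so summing one value per peptide is an assumption, and the function names are hypothetical. The "Number of Peptides" column counts k detections for each peptide found k times, which the second helper reproduces: 3 X 4356 + 2 X 1984 + 1 X 6070 = 23106.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def replication_summary(replicates: List[int], intensity: List[float]
                        ) -> Dict[int, Tuple[int, float, float]]:
    """Map each replication level k to (number of peptides found in exactly
    k injections, summed intensity, percent of total intensity)."""
    count: Dict[int, int] = defaultdict(int)
    summed: Dict[int, float] = defaultdict(float)
    for k, value in zip(replicates, intensity):
        count[k] += 1
        summed[k] += value
    grand_total = sum(summed.values())
    return {k: (count[k], summed[k], 100.0 * summed[k] / grand_total)
            for k in sorted(count, reverse=True)}


def total_detections(counts_by_level: Dict[int, int]) -> int:
    """Total detections: a peptide found in k injections contributes k."""
    return sum(k * n for k, n in counts_by_level.items())
```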



[Figures (continued): plot pages not recoverable from the scan. Surviving axis labels include "Intensity ratio" and "Count".]
C:\Documents and Settings\lig\My Documen..\track3clusters006.m
February 13, 2004 1:42:15 PM



% track three clusters 

% Track 3 clusters found from runPeptideXX
% MVG June 2003

clc

delwin

clear all, pack



% load stuff

disp('* Track 3 cluster *')

disp('* Retention time track *')

disp('* three peptide clusters *')

disp('****************************************')
disp(' ')



% load stuff 



useDefault = dinput('Use default inputs', 1);
if useDefault
    fileNameStkClstrA = '031403_JcsProtMix02_04Stk_01Pep';
    fileNameStkClstrB = '031403_JcsProtMix02_05Stk_01Pep';
    fileNameStkClstrC = '031403_JcsProtMix02_06Stk_01Pep';
else
    fileNameStkClstrA = uigetfile('*.mat', 'Pick Pep File A');
    fileNameStkClstrB = uigetfile('*.mat', 'Pick Pep File B');
    fileNameStkClstrC = uigetfile('*.mat', 'Pick Pep File C');
end

[indOCluster_A, indNextIso_A, numZCluster_A, numIons_A, ...
    m_zStk_A, retTimeStk_A, responseSNR_A, indPepForClstr_A] = getCluster(fileNameStkClstrA);

[indOCluster_B, indNextIso_B, numZCluster_B, numIons_B, ...
    m_zStk_B, retTimeStk_B, responseSNR_B, indPepForClstr_B] = getCluster(fileNameStkClstrB);

[indOCluster_C, indNextIso_C, numZCluster_C, numIons_C, ...
    m_zStk_C, retTimeStk_C, responseSNR_C, indPepForClstr_C] = getCluster(fileNameStkClstrC);

indHitPep_B = zeros(1, max(indPepForClstr_B));
indTestPep_B = zeros(1, max(indPepForClstr_B));

retTimeOffset = dinput('Maximum retention time offset (min)', 1);
mzOffset = dinput('Maximum m/z offset (amu)', 0.02);

stopForPlots = dinput('Stop for all plots?', 0);
stopForMissedTriple = dinput('Stop for missed Triples?', 0);
SNRLimit = dinput('SNR Limit to quit', 10);

minRetTime_A = min(retTimeStk_A);
retTimeHitB = retTimeClstr(1);
m_zHitB     = m_zClstr(1);
respHitB    = responseClstr(1);
numZHitB    = numZCluster_B(indCluster);

if (respHitB < SNRLimit)
    break
end

indTestPep_B(indPepForClstr_B(indCluster)) = 1;
% Find hits to A 

% save rtLimitsAB sortAB upperRTLimit lowerRTLimit 
load rtLimitsAB 



retTimeO_A  = retTimeStk_A(indOCluster_A+1);
m_zO_A      = m_zStk_A(indOCluster_A+1);
respO_A     = responseSNR_A(indOCluster_A+1);
rtBool      = abs(retTimeO_A - retTimeClstr(1)) < retTimeOffset;
rtBoolUpper = retTimeO_A - retTimeClstr(1) < interp1(sortAB, upperRTLimit, retTimeClstr(1));
rtBoolLower = retTimeO_A - retTimeClstr(1) > interp1(sortAB, lowerRTLimit, retTimeClstr(1));
muBool      = abs(m_zO_A - m_zClstr(1)) < mzOffset;
respBool    = abs(log10(responseClstr(1) ./ respO_A)) < log10(2);
zBool       = numZCluster_A == numZHitB;
%indHit_A   = find(rtBool & muBool & respBool & zBool);
indHit_A    = find(rtBoolUpper & rtBoolLower & muBool & respBool & zBool);

retTimeHitA = retTimeStk_A(indOCluster_A(indHit_A)+1);
m_zHitA     = m_zStk_A(indOCluster_A(indHit_A)+1);



% Find hits to C 

% save rtLimtsCB sortCB upperRTLimit lowerRTLimit
load rtLimtsCB 



% Find hits to C 



retTimeO_C  = retTimeStk_C(indOCluster_C+1);
m_zO_C      = m_zStk_C(indOCluster_C+1);
respO_C     = responseSNR_C(indOCluster_C+1);
rtBool      = abs(retTimeO_C - retTimeClstr(1)) < retTimeOffset;
rtBoolUpper = retTimeO_C - retTimeClstr(1) < interp1(sortCB, upperRTLimit, retTimeClstr(1));
rtBoolLower = retTimeO_C - retTimeClstr(1) > interp1(sortCB, lowerRTLimit, retTimeClstr(1));
muBool      = abs(m_zO_C - m_zClstr(1)) < mzOffset;
respBool    = abs(log10(responseClstr(1) ./ respO_C)) < log10(2);
zBool       = numZCluster_C == numZHitB;

indHit_C = find(rtBool & muBool & respBool & zBool);
indHit_C = find(rtBoolUpper & rtBoolLower & muBool & respBool & zBool);
%indHit_C = find(rtBool & muBool & respBool);
[sortCB, indCB] = sort(rtCB_B);
deltaCB = deltaCB(indCB);

[sortABC_B, indSort] = sort(rtABC_B);
deltaABC_A = rtABC_A - rtABC_B;
deltaABC_A = deltaABC_A(indSort);

deltaABC_C = rtABC_C - rtABC_B;
deltaABC_C = deltaABC_C(indSort);



retTDelta = 0.2; 

medianWidth = 5; 

figure(3)

subplot(3, 1, 1)
plot(sortAB, deltaAB, 'x-')

upperRTLimit = medianFilter(deltaAB, medianWidth) + retTDelta;
lowerRTLimit = medianFilter(deltaAB, medianWidth) - retTDelta;
indOverUnder = find(deltaAB > upperRTLimit | deltaAB < lowerRTLimit);

hold on
plot(sortAB, upperRTLimit, 'k-')
plot(sortAB, lowerRTLimit, 'k-')
plot(sortAB(indOverUnder), deltaAB(indOverUnder), 'ro')
hold off

stitle('%d A Clusters hit %d B Clusters with SNR > %d', ...
    [length(rtAB_B), numExamined, SNRLimit]);
ylabel('\DeltaT_{Ret} (min)')
grid on
zoom on

save rtLimitsAB sortAB upperRTLimit lowerRTLimit

subplot(3, 1, 2)
plot(sortCB, deltaCB, 'rx-')

upperRTLimit = medianFilter(deltaCB, medianWidth) + retTDelta;
lowerRTLimit = medianFilter(deltaCB, medianWidth) - retTDelta;
indOverUnder = find(deltaCB > upperRTLimit | deltaCB < lowerRTLimit);

hold on
plot(sortCB, upperRTLimit, 'k-')
plot(sortCB, lowerRTLimit, 'k-')
plot(sortCB(indOverUnder), deltaCB(indOverUnder), 'ro')
hold off

stitle('%d C Clusters hit %d B Clusters with SNR > %d', ...
    [length(rtCB_B), numExamined, SNRLimit]);
xlabel('min')
ylabel('\DeltaT_{Ret} (min)')
grid on
zoom on

save rtLimtsCB sortCB upperRTLimit lowerRTLimit
subplot(3, 1, 3)



lot 
bar(log10(binSNR), [numAllB', numHitB'])

stitle('%d A&C Clusters hit %d B Clusters. (%d A&C Peptides hit %d B Peptides)', ...
    [length(deltaABC_A), numExamined, sum(indHitPep_B), sum(indTestPep_B)]);
xlabel('log10 SNR')
ylabel('Number per log interval')
legend('B clusters', 'ABC replicates')
limits = axis;
axis([0, limits(2:4)])
grid on
zoom on

%

subplot(2, 1, 2)

sortRespAllB = -sort(-respAllB);
sortRespABCHit = -sort(-respABCHit(:, 2));
normSum = sum(sortRespAllB);

plot(log10(sortRespAllB), 100*cumsum(sortRespAllB)/normSum, '-')
hold on
plot(log10(sortRespABCHit), 100*cumsum(sortRespABCHit)/normSum, 'r-')
hold off

xlabel('log10 SNR')
ylabel('Percent cumulative response')
title('Cumulative response of clusters')
legend('B clusters', 'ABC replicates')
grid on
zoom on

save track3save
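The cumulative-response plot at the end of this listing sorts responses from strongest to weakest and plots the running sum as a percentage of the total response of all B clusters. A Python sketch of that computation (the function name is hypothetical); passing the same normalising sum for the replicated subset, as the listing does with `normSum`, shows what fraction of the total signal the replicated clusters account for.

```python
import math
from itertools import accumulate
from typing import List, Tuple


def cumulative_response(responses: List[float], norm_sum: float
                        ) -> Tuple[List[float], List[float]]:
    """Return (log10 response, percent cumulative response), strongest first."""
    ordered = sorted(responses, reverse=True)
    x = [math.log10(r) for r in ordered]
    y = [100.0 * c / norm_sum for c in accumulate(ordered)]
    return x, y
```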
% track three clusters

% Ret time Track peptides found from runPeptideXX and from Manchester's UMLG
% MVG July 2003

% Copyright © 2003 Waters Corporation
clc

delwin
clear all
pack

disp(['Executing: ', mfilename])

disp('*****************************************')
disp('*                                       *')
disp('*   Track peptides by retention time.   *')
disp('*   Inputs: 2 or more xxxPep.mat files  *')
disp('*                                       *')
disp('*****************************************')

disp(' ')

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Input files -- each file is an injection
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
cdCurrent = cd;
fileNameArray = [];
pathNameArray = [];
numFile = 0;
iiInj = 0;

fileType = {'*.mat', '*.mat', '*.txt'};

barMessage = {'Select xxxTrk3D.mat file', 'Select xxxPep3D.mat file', 'Select xxxCom.txt file'};

peptide = cell(2, 13);
while 1

    dataType = menu('Select a file type, or FINISH:', 'Track3D', 'Apex3D', 'UMLG', 'FINISH');

    if dataType == 4
        break;
    end

    [filename, pathname] = uigetfile(fileType{dataType}, barMessage{dataType});
    if filename == 0
        continue
    else
        numFile = numFile + 1;
        iiInj = iiInj + 1;

        fileNameArray{numFile} = [filename];
        pathNameArray{numFile} = [pathname];

        cd(pathname)

        disp(['Process ', filename])
    end

    if dataType == 1
        [mwHPlusPep, retTimePep, responsePep, numZPep, idCluster] = track3DGetTrk3D02(filename);
        peptide(iiInj, 1:5) = {mwHPlusPep, retTimePep, responsePep, numZPep, idCluster};
numZPepTarget = peptide{param.iiTarget, 4};

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Control parameters
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

fileNameTarget = param.fileNameArray{param.iiTarget};

param.fileNameTrack3D = dinput('Filename of log, comp, and .mat file', fileNameTarget(1:(end-4)));

filenameDiary = [param.fileNameTrack3D, 'log.txt'];
if exist(filenameDiary)
    delete(filenameDiary)
end

diary(filenameDiary)



disp('*******************************************')
disp('* TrackNPeptides                          *')
disp('*******************************************')
disp(mfilename)
disp(datestr(now))
disp('*******************************************')
disp('Files to be processed:')
disp(' ')

for ii = 1:param.numInjections
    disp(param.fileNameArray{ii})
end

disp(' ')
disp('*******************************************')
disp('Input parameters')

param.retTimeOffsetTracking = dinput('Retention time offset for tracking pass (min)', 5.0);
param.heightRatioTracking   = dinput('Height ratio range for tracking pass', 0.4);
param.SNRLimitTracking      = dinput('SNR Limit for tracking pass', 100);
param.SNRLimitID            = dinput('SNR Limit for identification', 10);
param.mzOffset              = dinput('Maximum m/z offset (amu)', 0.02);
param.zLowerLimit           = dinput('Lower fraction charge limit', 1.5);
param.zUpperLimit           = dinput('Upper fraction charge limit', 10.0);
param.retTimeThresholdID    = dinput('RetTime Threshold for ID (-1=auto)', -1);
param.retTimeStart          = dinput('Start at target retention time (min)', min(retTimeTarget));
param.retTimeEnd            = dinput('End at target retention time (min)', max(retTimeTarget));
param.zThreshold            = 0.5;



% Definition 

% param.sigmaRetTimeResid(ii) = median(abs(retTimeResidNoZero))/0.67;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Normalize responses using median
% of peptides that hit
%
% Construct retention time map w/r to target
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

iiPlot = 0;
% Find ret time map from the ones that hit between target and A
retTimeRef = retTimeA(indHitA);
retTimeDelta = retTimeA(indHitA) - retTimeTarget(indHitTarget);
deltaMwHPlus = mwHPlusA(indHitA) - mwHPlusTarget(indHitTarget);
mwHPlusHit = mwHPlusA(indHitA);

% Sort by ret time A (why not target?)
[retTimeRef, iSrt] = sort(retTimeRef);
retTimeDelta = retTimeDelta(iSrt);
deltaMwHPlus = deltaMwHPlus(iSrt);
mwHPlusHit = mwHPlusHit(iSrt);

% Need to delete the occasional coincidence in order to use interp1
indDelete = find(diff(retTimeRef) == 0.0);
retTimeRef(indDelete) = [];
retTimeDelta(indDelete) = [];
deltaMwHPlus(indDelete) = [];
mwHPlusHit(indDelete) = [];

% Find the backbone.
retTimeDeltaMedian = medianFilter(retTimeDelta', medianWidth)';

% Obtain the residuals about the backbone
retTimeResiduals = retTimeDelta - retTimeDeltaMedian;
retTimeResidNoZero = retTimeResiduals(find(retTimeResiduals ~= 0.0));

% Determine threshold, in a roundabout way.
if param.retTimeThresholdID <= 0.0
    % Robust sigma: median absolute nonzero residual / 0.67
    param.sigmaRetTimeResid(ii) = median(abs(retTimeResidNoZero))/0.67;
else
    param.sigmaRetTimeResid(ii) = param.retTimeThresholdID/4.0;
end

% Obtain cutout (backbone +/- 4 sigma)
upperRTLimit = retTimeDeltaMedian + 4.0*param.sigmaRetTimeResid(ii);
lowerRTLimit = retTimeDeltaMedian - 4.0*param.sigmaRetTimeResid(ii);
indOverUnder = find(retTimeDelta > upperRTLimit | retTimeDelta < lowerRTLimit);
% Points inside the cutout must satisfy both limits, hence & rather than |
indGood = find(retTimeDelta < upperRTLimit & retTimeDelta > lowerRTLimit);

% Store cutout
peptide{ii, 7} = retTimeRef;
peptide{ii, 8} = upperRTLimit;
peptide{ii, 9} = lowerRTLimit;

% Histogram mwHPlus error
figure(5)
deltaMwHPlusGood = deltaMwHPlus(indGood);
mwHPlusHitGood = mwHPlusHit(indGood);

ppmHit = 1e6*deltaMwHPlusGood./mwHPlusHitGood;

binVec = min(deltaMwHPlusGood):0.001:max(deltaMwHPlusGood);
binPpm = min(ppmHit):0.5:max(ppmHit);

subplot(param.numInjections-1, 2, 2*iiPlot-1)
hist(deltaMwHPlusGood, binVec)
title(sprintf('Histogram about %d pt median. Inj %d - target. %d zeros removed. Sigma = %6.3f min', ...
    medianWidth, ii, sum(retTimeResiduals == 0.0), param.sigmaRetTimeResid(ii)))
xlabel('min')
ylabel('points per 0.01 min')
grid on

end

% move windows a bit for a better view.
screenSize = get(0, 'screensize');
screenHalf = screenSize(3)/2;
screenHeight = screenSize(4);

set(3, 'position', [1 1 screenHalf screenHeight/2])
set(4, 'position', [screenHalf 1 screenHalf screenHeight/2])

drawnow



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
%%%%% Call pass B of comparison program %%%% 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

peptide = track3DPassB02 (peptide , param) ; 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

% Plot ret time maps, 

% and histogram of delta ret time 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

iiPlot = 0; 

for ii = 1:param.numInjections

    % No need to change response of target.
    % No need to construct ret time map of target
    if (ii == param.iiTarget)
        continue;
    end

    iiPlot = iiPlot + 1;

    responseA = peptide{ii, 3};
    retTimeA  = peptide{ii, 2};
    mwHPlusA  = peptide{ii, 1};

    indHit       = peptide{ii, 11};
    indHitA      = indHit(:, 1);
    indHitTarget = indHit(:, 2);

    % Find ret time map
    retTimeHitA = retTimeA(indHitA);
    retTimeDelta = retTimeA(indHitA) - retTimeTarget(indHitTarget);

    [retTimeHitA, iSrt] = sort(retTimeHitA);
    retTimeDelta = retTimeDelta(iSrt);
logRespRatio = log10(responseA(indHitA)./responseTarget(indHitTarget));

sigmaRespRatioLog = median(abs(logRespRatio))/0.67;

hist(logRespRatio, 100)
xlabel('log10(ratio)')
ylabel('Number per interval')
stitle('Std dev respRatios for Inj %d = %5.3f', [ii, 10.0.^sigmaRespRatioLog])
grid on
zoom on

end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 
% Find all N peptides that replicate 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

% indHitMatAll(ii, jj) = index of peptide in injection jj that hit target peptide ii
indHitMatAll = zeros(length(responseTarget), param.numInjections);
for ii = 1:param.numInjections
    if (ii == param.iiTarget)
        continue
    end
    indHit = peptide{ii, 11};
    indHitA = indHit(:, 1);
    indHitTarget = indHit(:, 2);
    indHitMatAll(indHitTarget, ii) = indHitA;
end

% help logical
%
% LOGICAL Convert numeric values to logical.
%
% The term "logical indexing" refers to any indexing operation where
% the index expression is a logical array, in which case the index is
% treated as a mask that selects elements from the indexed array. In
% essence, it is a short-hand notation for A(FIND(B)) that enables us
% to simply write A(B) when B is a logical array. The result is the
% elements of A at the indices where B is one. It is often convenient
% to derive the index expression from the indexed data itself. For
% example, the positive elements of a vector A can be obtained using
% A(A>0).

indHitMatAllOmit = indHitMatAll;
indHitMatAllOmit(:, param.iiTarget) = [];

indHitLogical = indHitMatAllOmit > 0;

if param.numInjections == 2
    indHitSum = indHitLogical;
else
    indHitSum = sum(indHitLogical')';
end

% A peptide replicates in all injections when it hit every non-target injection
hitVecLogical = (indHitSum > (param.numInjections - 2));
totalHits = sum(hitVecLogical)
numPerBinHitTarget = [];

numTotalHitN{1} = int2str(length(indZTarget));
for ii = 1:(param.numInjections-1)
    hitVecNLogical = (indHitSum >= ii);
    numPerBinHitTarget(ii, :) = hist(responseTarget(find(hitVecNLogical)), binSNR);
    numTotalHitN{ii+1} = int2str(sum(hitVecNLogical));
end

bar(log10(binSNR), [numPerBinTarget', numPerBinHitTarget'])

stitle('%d Peptides hit %d Target peptides', [totalHits, length(indZTarget)]);

xlabel(['SNR (log10). Algorithm: ', param.inputDataAlgorithm, '. Target: ', fileNameTarget, ...
    '. Log File: ', param.fileNameTrack3D])

ylabel('Number per log interval')

legend(numTotalHitN)

limits = axis;

axis([logLimitLower, logLimitUpper, -10, limits(4)])
grid on
zoom on



subplot(2, 1, 2)

responseTargetSort = -sort(-responseTarget(indZTarget));
normSum = sum(responseTarget(indZTarget));

plot(log10(responseTargetSort), 100*cumsum(responseTargetSort)/normSum, '-')

for ii = 1:(param.numInjections-1)
    hitVecNLogical = (indHitSum >= ii);
    responseTargetHit = responseTarget(find(hitVecNLogical));
    responseTargetHitSort = -sort(-responseTargetHit);
    hold on
    plot(log10(responseTargetHitSort), 100*cumsum(responseTargetHitSort)/normSum, 'r-')
    hold off
end

xlabel('log10 SNR')
ylabel('Percent cumulative response')
title('Cumulative response of hits')
legend(param.fileNameArray)
limits = axis;

axis([logLimitLower, logLimitUpper, -5, 105])
grid on
zoom on

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Write text file output
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
compareFileName = [param.fileNameTrack3D, 'Comp.txt'];
if exist(compareFileName) == 2
    delete(compareFileName)
end

disp(['Write comparefile: ' compareFileName])
fid = fopen(compareFileName, 'w');

fprintf(fid, 'ID | mwHPlus | retTime | response | fracZ');

for jj = 1:param.numInjections
    if jj ~= param.iiTarget
end

disp('*****************************************')

disp(['Finished executing: ', mfilename])
diary off 
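The replicate count in this listing builds `indHitMatAll`, with one row per target peptide and one column per injection holding the index of the matching peptide (0 for no match), drops the target column, and counts the nonzero entries in each row. A peptide is found in all injections when its count equals `numInjections - 1`, which is what `indHitSum > (param.numInjections - 2)` tests. A Python sketch of the same bookkeeping, with hypothetical function names:

```python
from typing import List


def replicate_counts(hit_matrix: List[List[int]], target_col: int) -> List[int]:
    """hit_matrix[i][j] holds the index of the peptide in injection j that
    matched target peptide i, or 0 for no match; the target column is skipped."""
    return [sum(1 for j, idx in enumerate(row) if j != target_col and idx > 0)
            for row in hit_matrix]


def found_in_at_least(counts: List[int], k: int) -> int:
    """Number of target peptides matched in at least k non-target injections."""
    return sum(1 for c in counts if c >= k)
```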
peptide = track3DPassA02(peptide, param);
for ii = 1:param.numInjections

    % Find ret time map from the ones that hit between target and A
    retTimeRef = retTimeA(indHitA);
    retTimeDelta = retTimeA(indHitA) - retTimeTarget(indHitTarget);

    % Sort by ret time A
    [retTimeRef, iSrt] = sort(retTimeRef);
    retTimeDelta = retTimeDelta(iSrt);

    % Find the backbone.
    retTimeDeltaMedian = medianFilter(retTimeDelta', medianWidth)';

    % Compute reference ret time for each entity, and store it.
    retTimePepTargetA = retTimeA - interp1(retTimeRef, retTimeDeltaMedian, retTimeA);
    peptide{ii, 10} = retTimePepTargetA;



end 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function peptide = track3DPassA02(peptide, param)
for ii = 1:length(responseTarget)

thisRetTime = retTimeTarget(ii);
thisMwHPlus = mwHPlusTarget(ii);
thisResponse = responseTarget(ii);
thisNumZ = numZTarget(ii);

for jj = 1:param.numInjections

indSubset = find(abs(retTimeA - thisRetTime) < param.retTimeOffsetTracking);
mwHPlusASub = mwHPlusA(indSubset);
responseASub = responseA(indSubset);
numZASub = numZA(indSubset);
muBool = abs(mwHPlusASub - thisMwHPlus)/thisNumZ < param.mzOffset;
respBool = abs(log10(responseASub/thisResponse)) < abs(log10(param.heightRatioTracking));
zBool = abs(thisNumZ - numZASub) < param.zThreshold;
indHitA = indSubset(find(muBool & respBool & zBool));

if length(indHitA) == 1
indHitA1 = peptide{jj, 6};
indHitA1 = [indHitA1; [indHitA, ii]];
peptide(jj, 6) = {indHitA1};
end

end

end
[peptide] = track3DPassA01(peptide, param);

%
%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Storage map
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%



% peptide{ii, 1} = mwHPlusPep;
% peptide{ii, 2} = retTimePep;
% peptide{ii, 3} = responsePep;
% peptide{ii, 4} = numZPep;
% peptide{ii, 5} = idCluster;
% peptide{ii, 6} = indHit;
% peptide{ii, 7} = rtTable;           % After pass
% peptide{ii, 8} = rtPosDelta;        % After pass
% peptide{ii, 9} = rtNegDelta;        % After pass
% peptide{ii, 10} = retTimePepTarget; % ret time in target



%
% param.retTimeOffsetTracking = dinput('Retention time offset for tracking pass (min)', 5.0);
% param.heightRatioTracking = dinput('Height ratio range for tracking pass', 0.4);
% param.SNRLimitTracking = dinput('SNR Limit for tracking pass', 10);
% param.mzOffset = dinput('Maximum m/z offset (amu)', 0.02);
% param.stopForAllPlots = dinput('Stop for all plots?', 0);
% param.zThreshold = 0.5;



function peptide = track3DLoop01(peptide, param);
disp(['Execute: ', mfilename])

mwHPlusTarget = peptide{param.iiTarget, 1};
retTimeTarget = peptide{param.iiTarget, 2};
responseTarget = peptide{param.iiTarget, 3};
numZTarget = peptide{param.iiTarget, 4};

for ii = 1:length(responseTarget)

if rem(ii, 500) == 0
disp(ii)
end

if responseTarget(ii) < param.SNRLimitTracking
continue;
end

thisRetTime = retTimeTarget(ii);
thisMwHPlus = mwHPlusTarget(ii);
thisResponse = responseTarget(ii);
thisNumZ = numZTarget(ii);

for jj = 1:param.numInjections
if jj == param.iiTarget
continue
end

mwHPlusA = peptide{jj, 1};
retTimeA = peptide{jj, 2};
responseA = peptide{jj, 3};
numZA = peptide{jj, 4};

indSubset = find(abs(retTimeA - thisRetTime) < param.retTimeOffsetTracking);
% [peptide] = track3DPassB01(peptide, param);
%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Storage map
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% peptide{ii, 1} = mwHPlusPep;
% peptide{ii, 2} = retTimePep;
% peptide{ii, 3} = responsePep;
% peptide{ii, 4} = numZPep;
% peptide{ii, 5} = idCluster;
% peptide{ii, 6} = indHit;
% peptide{ii, 7} = rtTable;           % After pass 1
% peptide{ii, 8} = rtPosDelta;        % After pass 1
% peptide{ii, 9} = rtNegDelta;        % After pass 1
% peptide{ii, 10} = retTimePepTarget; % ret time in target
% peptide{ii, 11} = indHit;           % defined in PassB



% param.retTimeOffsetTracking = dinput('Retention time offset for tracking pass (min)', 5.0);
% param.heightRatioTracking = dinput('Height ratio range for tracking pass', 0.4);
% param.SNRLimitTracking = dinput('SNR Limit for tracking pass', 10);
% param.mzOffset = dinput('Maximum m/z offset (amu)', 0.02);
% param.stopForAllPlots = dinput('Stop for all plots?', 0);
% param.zThreshold = 0.5;



% Definition
% param.sigmaRetTimeResid(ii) = median(abs(retTimeResidNoZero))/0.67;



function peptide = track3DPassB01(peptide, param);
disp(['Execute: ', mfilename])

mwHPlusTarget = peptide{param.iiTarget, 1};
retTimeTarget = peptide{param.iiTarget, 2};
responseTarget = peptide{param.iiTarget, 3};
numZTarget = peptide{param.iiTarget, 4};

for ii = 1:length(responseTarget)

if rem(ii, 500) == 0
disp(ii)
end

if responseTarget(ii) < param.SNRLimitID
continue;
end

if numZTarget(ii) < param.zLowerLimit | numZTarget(ii) > param.zUpperLimit
continue;
end

thisRetTime = retTimeTarget(ii);
thisMwHPlus = mwHPlusTarget(ii);
thisResponse = responseTarget(ii);
thisNumZ = numZTarget(ii);
ABSTRACT



Ion Detection in Three Dimensions: A Novel Algorithm to Detect and Quantify Ions Obtained from High-Accuracy LC/MS Separations of Tryptic Digests of Complex Protein Mixtures



Introduction: The potential of LC/MS separations of complex mixtures is fully realized only when all the ions detected by the mass spectrometer are recovered in the analysis of the data. Once detected, the ions can be used for quantitative and qualitative purposes. Ions from the isotopes of a peptide, for example, can be assembled into clusters, and the monoisotopic mass can then be accurately determined.

Thus the deceptively simple problem of ion detection is, in fact, a potential limiting step in the exploitation of LC/MS data. For example, a peak-detection algorithm originally designed to detect peaks in a spectrum may be adapted to the problem of ion detection in three-dimensional LC/MS separations, resulting in less than optimal performance. Here, we introduce a novel three-dimensional ion detection algorithm optimized for the analysis of high-mass-accuracy LC/MS data.

Methods: The method assembles the spectra obtained from the LC/MS separation into a matrix whose columns are the spectra and whose rows are the chromatograms. A novel convolution method, based on the properties of matched filters, is applied to this matrix. The filter is designed to identify all the potentially detectable ions present in the data. The approach therefore lends itself to resolution enhancement: pairs of ions that are only partially resolved, or that appear as shoulders, can be separately quantified. In addition, low-intensity ions that might otherwise be overlooked can be detected, so the method lends itself to the analysis of samples whose intensities span the full dynamic range of the instrument.
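The abstract does not specify the filter kernel. As a minimal sketch, assume a separable Gaussian kernel (the matched filter for a Gaussian-shaped peak in white noise) as a stand-in for the novel filter. Under that assumption, the matrix can be convolved along each spectrum and each chromatogram, and candidate ions taken as strict local maxima above a threshold. The function names, kernel widths and threshold are hypothetical. A plain Gaussian smooths rather than sharpens, so this sketch does not reproduce the resolution enhancement described above.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Unit-sum 1-D Gaussian sampled out to +/- 3 sigma (stand-in matched filter)."""
    half_width = int(np.ceil(3 * sigma))
    x = np.arange(-half_width, half_width + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def filter_matrix(matrix, sigma_mz, sigma_rt):
    """Convolve each spectrum (column, m/z axis), then each chromatogram
    (row, time axis). Each axis must be at least as long as its kernel."""
    k_mz = gaussian_kernel(sigma_mz)
    k_rt = gaussian_kernel(sigma_rt)
    out = np.apply_along_axis(np.convolve, 0, matrix, k_mz, mode="same")
    out = np.apply_along_axis(np.convolve, 1, out, k_rt, mode="same")
    return out


def detect_ions(matrix, sigma_mz=1.0, sigma_rt=2.0, threshold=0.0):
    """Return (row, col, height) for each strict 3x3 local maximum of the
    filtered matrix whose height exceeds threshold."""
    f = filter_matrix(np.asarray(matrix, dtype=float), sigma_mz, sigma_rt)
    padded = np.pad(f, 1, mode="constant", constant_values=-np.inf)
    n_rows, n_cols = f.shape
    is_peak = f > threshold
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr or dc:
                neighbour = padded[1 + dr:1 + dr + n_rows, 1 + dc:1 + dc + n_cols]
                is_peak &= f > neighbour
    rows, cols = np.nonzero(is_peak)
    return [(int(r), int(c), float(f[r, c])) for r, c in zip(rows, cols)]
```

For a synthetic matrix containing two well-separated Gaussian blobs, `detect_ions` returns one entry per blob at its centre. A shoulder would need a sharper kernel, such as a negative second-derivative-of-Gaussian, to appear as its own maximum.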

Results: The samples used to evaluate the new algorithm were obtained from tryptic digests of protein test mixtures and from serum spiked with the test mixtures. Data from these digests were obtained using a high-resolution (> 17,000) orthogonal quadrupole time-of-flight mass spectrometer. The ions resulting from different isotopic states are separately detected, and high-accuracy mass, retention time, and intensity values are obtained for each ion. These ions are assembled into clusters, producing a unique mwHPlus value for each cluster.
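For illustration, assume the charge state z of a cluster is already known. The cluster's monoisotopic ion can then be converted to a singly protonated mass (mwHPlus, i.e. MH+) with MH+ = z(m/z) - (z - 1)m_p, where m_p = 1.007276 Da is the proton mass. The sketch below also treats the lowest-m/z ion as monoisotopic and groups ions by a greedy isotope-spacing rule. That rule, the 0.01 tolerance, and the function names are hypothetical simplifications, not the clustering method of the abstract.

```python
PROTON_MASS = 1.007276      # Da
ISOTOPE_SPACING = 1.003355  # Da, 13C - 12C mass difference


def mh_plus(mz, z):
    """Singly protonated mass (MH+) of an ion observed at m/z with charge z."""
    return z * mz - (z - 1) * PROTON_MASS


def cluster_isotopes(ions, z, tol_mz=0.01):
    """Greedy sketch: walk up in m/z from the lowest ion, keeping each ion that
    sits one isotope step (ISOTOPE_SPACING / z) above the last kept ion.
    ions: (mz, intensity) pairs assumed to share charge z and retention time.
    Returns (MH+ of the lowest ion, summed intensity, number of isotopes)."""
    ions = sorted(ions)
    cluster = [ions[0]]
    for mz, intensity in ions[1:]:
        expected = cluster[-1][0] + ISOTOPE_SPACING / z
        if abs(mz - expected) <= tol_mz:
            cluster.append((mz, intensity))
    total = sum(intensity for _, intensity in cluster)
    return mh_plus(cluster[0][0], z), total, len(cluster)
```

For example, a doubly charged ion at m/z 500.0 corresponds to MH+ = 1000.0 - 1.007276 = 998.992724 Da.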

We demonstrate the quantitative reproducibility of the molecular weight, retention time, and intensity values obtained with this algorithm over the large dynamic range of the data.
Statistical study of LC/MS/MS data of human serum



This work provides a statistical study of LC/MS/MS data of human serum. Such a study is very important for understanding experimental data from complex biological mixtures. The digested peptides from the sample (human serum) are analyzed by LC/MS/MS. The raw LC/MS/MS data are processed to generate ion sticks. Each ion stick has three parameters: m/z, retention time and intensity. One-dimensional histograms of m/z, retention time and intensity are studied for both MS and MS/MS data, as are two-dimensional histograms of m/z versus retention time. From these studies we can find the most frequent values of m/z, retention time and intensity. The ion sticks are then deconvoluted into peptide lists by both charge and isotopic deconvolution, for both MS and MS/MS data. Each peptide arises from multiple isotopes and multiple charge states. Each peptide has three parameters: peptide mhp, peptide retention time and peptide intensity. The histograms of peptide mhp, peptide retention time and peptide intensity are studied, and the most frequent value of each is found. Two-dimensional histograms of peptide mhp versus retention time are also studied for both MS and MS/MS data, using different numbers of bins in mhp and retention time. These histograms indicate how many peptides can be found in a given mhp and retention-time window for complex biological mixtures.
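Counting the peptides that fall in a given mhp and retention-time window amounts to a two-dimensional histogram. The minimal sketch below assumes fixed-width bins anchored at the smallest observed value of each parameter. The function name and bin widths are illustrative, and `numpy.histogram2d` would serve equally well when explicit bin edges are preferred.

```python
import numpy as np


def peptides_per_window(mhp, rt, mhp_width, rt_width):
    """2-D histogram: counts[i, j] is the number of peptides whose mhp lies in
    bin i (width mhp_width, Da) and whose retention time lies in bin j
    (width rt_width, min). Bins start at the smallest observed values."""
    mhp = np.asarray(mhp, dtype=float)
    rt = np.asarray(rt, dtype=float)
    i = np.floor((mhp - mhp.min()) / mhp_width).astype(int)
    j = np.floor((rt - rt.min()) / rt_width).astype(int)
    counts = np.zeros((i.max() + 1, j.max() + 1), dtype=int)
    np.add.at(counts, (i, j), 1)  # unbuffered add handles repeated (i, j) pairs
    return counts
```

Rerunning with different `mhp_width` and `rt_width` values corresponds to studying the histogram "for different numbers of bins"; `counts.max()` gives the most crowded window.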

The next step is to study replicate injections. The sample (human serum) is analyzed by LC/MS/MS in three replicate injections. The statistics (mean, median, standard deviation and coefficient of variation) of the number of peptides and of the total peptide intensity across the three injections are provided. The coefficient of variation of both the number of peptides and the total peptide intensity across the three injections is about 5%. This demonstrates the reproducibility of the total number of peptides and of the total peptide intensity. In summary, we have studied the statistics of LC/MS/MS data of human serum, which is very useful for understanding LC/MS/MS experimental data from complex biological mixtures.
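The replicate statistics named above take only a few lines to compute. The sketch below uses the sample standard deviation (n - 1 denominator), which is an assumption because the abstract does not say which estimator was used. The input values are hypothetical and are not data from this study.

```python
import statistics


def replicate_stats(values):
    """Mean, median, sample standard deviation and coefficient of variation (%)
    of one quantity (e.g. peptide count or total intensity) across injections."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation, n - 1 denominator
    return {
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "cv_percent": 100.0 * sd / mean,
    }


# Hypothetical totals from three injections (illustrative only).
stats = replicate_stats([100, 110, 90])
```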
Statistical study of LC/MS data of human serum spiked with
five proteins 



This work provides a statistical study of LC/MS data of human serum spiked with five proteins. Such a study is an important step toward quantitatively comparing the relative levels of proteins contained in two or more complex biological mixtures. Two samples are used: sample 1 is human serum spiked with 5 pmole of five proteins, and sample 2 is human serum spiked with 1 pmole of the same five proteins. The digested peptides from each sample are analyzed by LC/MS, with three replicate LC/MS runs per sample. The raw LC/MS data are processed to generate ion sticks. Each ion stick has three parameters: m/z, retention time and intensity. The ion sticks are then deconvoluted into peptide sticks by both charge and isotopic deconvolution. Each peptide arises from multiple isotopes and multiple charge states. Each peptide has three parameters: peptide mhp, peptide retention time and peptide intensity. The statistics (mean, median, standard deviation and coefficient of variation) of the number of peptides and of the total peptide intensity across the three injections of each sample are provided. This demonstrates the reproducibility of the total number of peptides and of the total peptide intensity.

The next step is to study the replication of each peptide across the three injections of each sample, in terms of both the number of replicated peptides and their intensities. About 60% of the peptides, and about 90% of the peptide intensity, are replicated. This indicates that the non-replicated peptides are mostly of low intensity. The average coefficient of variation of the replicated intensities is about 20%. The replication of each peptide across all six injections of the two samples is then studied. For all peptides replicated in all six injections, the statistics (mean, median, standard deviation and coefficient of variation) of peptide intensity across the three injections of each sample are provided. The mean intensities of the replicated peptides are compared between the two samples to indicate the relative change in level of the spiked proteins. The ratio of the mean intensities of the replicated peptides between the two samples is plotted against the mean coefficient of variation of intensity of the two samples. In summary, this statistical study of LC/MS data is very useful for quantitatively comparing the relative levels of proteins contained in two or more complex biological mixtures.
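The replication analysis can be sketched in Python on an intensity matrix with one row per peptide and one column per injection. The abstract does not define "replicated". This sketch assumes that a peptide counts as replicated only when it is detected (intensity > 0, with NaN treated as not detected) in every injection. It also assumes that peptides have already been matched across injections so that rows line up. All names are hypothetical.

```python
import numpy as np


def replicated(x):
    """True for rows (peptides) detected, i.e. intensity > 0, in every injection."""
    return (x > 0).all(axis=1)


def replication_summary(x):
    """Fraction of peptides replicated, fraction of total intensity they carry,
    and the mean coefficient of variation of their intensities."""
    x = np.nan_to_num(np.asarray(x, dtype=float), nan=0.0)
    mask = replicated(x)
    rep = x[mask]
    cv = rep.std(axis=1, ddof=1) / rep.mean(axis=1)
    return mask.mean(), rep.sum() / x.sum(), cv.mean()


def ratio_vs_cv(s1, s2):
    """For peptides replicated in every injection of both samples: the ratio of
    mean intensities (s1 / s2) and the mean of the two samples' CVs, i.e. the
    two quantities plotted against each other."""
    s1 = np.nan_to_num(np.asarray(s1, dtype=float), nan=0.0)
    s2 = np.nan_to_num(np.asarray(s2, dtype=float), nan=0.0)
    both = replicated(s1) & replicated(s2)
    a, b = s1[both], s2[both]
    ratio = a.mean(axis=1) / b.mean(axis=1)
    cv = 0.5 * (a.std(axis=1, ddof=1) / a.mean(axis=1)
                + b.std(axis=1, ddof=1) / b.mean(axis=1))
    return ratio, cv
```

Requiring detection in every injection is the strictest choice. A looser rule, such as detection in at least two of three injections, would only change `replicated`.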
Statistical study of LC/MS/MS data of human serum

This work provides a statistical study of LC/MS/MS data of human serum. Such a study is very important for understanding experimental data from complex biological mixtures. We study two cases: case 1, one injection of one sample; case 2, three or more injections of one or two samples.

Case 1 studies one injection of one sample: this study provides the statistics of ions and peptides in LC/MS/MS data of human serum. The digested peptides from human serum are analyzed in a single LC/MS/MS injection. The raw LC/MS/MS data are processed to generate ion sticks. Each ion stick has three parameters: m/z, retention time and intensity. One-dimensional histograms of m/z, retention time and intensity are studied for both MS and MS/MS data, as are two-dimensional histograms of m/z versus retention time. From these studies we can find the most frequent values of m/z, retention time and intensity. The ion sticks are then deconvoluted into peptide lists by both charge and isotopic deconvolution, for both MS and MS/MS data. Each peptide arises from multiple isotopes and multiple charge states. Each peptide has three parameters: peptide mhp (peptide mass plus proton mass), peptide retention time and peptide intensity. The histograms of peptide mhp, peptide retention time and peptide intensity are studied, and the most frequent value of each is found. Two-dimensional histograms of peptide mhp versus retention time are also studied for both MS and MS/MS data, using different numbers of bins in mhp and retention time. These histograms indicate how many peptides can be found in a given mhp and retention-time window for complex biological mixtures.

Case 2 studies three or more injections of one or two samples: this study provides the statistics of replicated LC/MS data from one or two samples. The statistics (mean, median, standard deviation and coefficient of variation) of the number of peptides and of the total peptide intensity across three injections of the sample (human serum) are provided. About 60% of the peptides, and about 90% of the peptide intensity, are replicated. For all replicated peptides, histograms of the mhp difference, retention time difference and intensity difference within each replicated peptide pair are studied. The statistics (mean, standard deviation and coefficient of variation) of the mhp and intensity of the replicated peptides are also provided, as are histograms of the mean intensity of replicated and non-replicated peptides. A similar statistical study is provided for six injections of two samples (sample 1 is human serum spiked with 5 pmole of five proteins; sample 2 is human serum spiked with 1 pmole of five proteins). The mean intensities of the replicated peptides are compared between the two samples to indicate the relative change in level of the spiked proteins.
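The pair-difference histograms can be sketched in Python as follows, assuming the replicated pairs have already been matched, so element k of each "a" array and the same element of each "b" array describe one peptide. Expressing the mhp difference in ppm and the intensity difference as a log10 ratio are assumptions made for illustration, since the abstract does not state the units. The function name and the 2-ppm bin width are hypothetical.

```python
import numpy as np


def pair_differences(mhp_a, mhp_b, rt_a, rt_b, int_a, int_b):
    """Per-pair differences for peptides replicated in injections a and b:
    mhp difference in ppm, retention-time difference, log10 intensity ratio."""
    mhp_a = np.asarray(mhp_a, dtype=float)
    mhp_b = np.asarray(mhp_b, dtype=float)
    d_ppm = 1e6 * (mhp_b - mhp_a) / mhp_a
    d_rt = np.asarray(rt_b, dtype=float) - np.asarray(rt_a, dtype=float)
    log_ratio = np.log10(np.asarray(int_b, dtype=float) / np.asarray(int_a, dtype=float))
    return d_ppm, d_rt, log_ratio


# Two hypothetical pairs (illustrative only), then a histogram in 2-ppm bins.
d_ppm, d_rt, log_ratio = pair_differences(
    [1000.0, 2000.0], [1000.005, 1999.99],
    [30.0, 45.0], [30.1, 44.95],
    [100.0, 200.0], [200.0, 20.0])
counts, edges = np.histogram(d_ppm, bins=np.arange(-20, 22, 2))
```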
Towards Quantitative Global Proteomics: Statistical Results Obtained from
Multiple Tryptic Digests of Complex Protein Mixtures Using Novel Algorithms for 
the Detection, Tracking and Quantitation of Peptides 



Introduction: Quantitation of proteins by high-mass-accuracy LC/MS separations requires reproducible sample preparation, robust separation methods, and accurate mass measurements. With such high-quality data in hand, our attention must turn to the algorithms needed to extract information from the data. One critical algorithmic step is the reliable tracking of molecular entities between samples.

A molecular entity detected in one injection could be located (i.e., tracked) in another injection by comparing mass values alone. However, in complex mixtures such as tryptic digests, a retention-time search window of a few minutes may contain pairs of entities that have the same measured mass but are in fact unrelated. The resulting tracking errors compromise quantitation.

Methods: The novel algorithmic method introduced here addresses the problem of tracking. The method relies on accurate mass measurements to find the subset of entities that can be uniquely tracked by accurate mass alone. These uniquely matched pairs determine a retention time map, and such a map is found for every injection in a sample set.

These maps are then used to assign a unique reference retention time to every molecular entity in every injection. In effect, the method uses the uniquely paired masses as internal standards to correct the retention time offset of all entities. The reference retention times of an entity can then be compared between any two samples in the sample set.
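The retention-time map can be sketched in Python, loosely following the steps of the MATLAB listing above: take matched pairs, sort by retention time, median-filter the offsets, and interpolate. This is a minimal sketch with several assumptions. A pair counts as unique when each member has exactly one mass match within a ppm tolerance. The backbone is a simple running median with truncated windows at the ends. The tolerance, window width and function names are hypothetical. Unlike MATLAB's `interp1`, `np.interp` holds the end values constant outside the range of matched retention times instead of returning NaN. The all-pairs mass comparison uses O(N·M) memory and is meant only for illustration, and at least one unique pair is required.

```python
import numpy as np


def unique_mass_matches(mass_a, mass_t, tol_ppm):
    """Pairs (i, j) where entity i of injection A matches exactly one target
    entity j within tol_ppm, and j matches no other entity of A."""
    mass_a = np.asarray(mass_a, dtype=float)
    mass_t = np.asarray(mass_t, dtype=float)
    close = np.abs(mass_a[:, None] - mass_t[None, :]) <= tol_ppm * 1e-6 * mass_t[None, :]
    unique = (close
              & (close.sum(axis=1, keepdims=True) == 1)
              & (close.sum(axis=0, keepdims=True) == 1))
    return np.argwhere(unique)


def running_median(y, width):
    """Median over a centred window of `width` points (truncated at the ends)."""
    half = width // 2
    return np.array([np.median(y[max(0, k - half):k + half + 1]) for k in range(len(y))])


def reference_rt(rt_a, rt_t, pairs, width=5):
    """Reference retention time for every entity of injection A: subtract the
    median-filtered offset (A minus target) interpolated at each retention time."""
    rt_a = np.asarray(rt_a, dtype=float)
    rt_t = np.asarray(rt_t, dtype=float)
    i, j = pairs[:, 0], pairs[:, 1]
    ref = rt_a[i]
    delta = rt_a[i] - rt_t[j]
    order = np.argsort(ref, kind="stable")
    ref, delta = ref[order], delta[order]
    backbone = running_median(delta, width)
    return rt_a - np.interp(rt_a, ref, backbone)
```

Dropping ambiguous masses before building the map is what lets the remaining pairs act as internal standards. The running median then keeps a few mismatched pairs from bending the backbone.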

Results: The reference retention time puts all samples on an equal footing. The search window associated with the reference retention time can be as narrow as +/-0.2 minutes, much smaller than conventional search windows several minutes wide. The reference retention time, together with accurate mass, can then be used to track an entity from injection to injection in a sample set.
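With reference retention times in hand, tracking reduces to a search within two windows. This minimal sketch assumes a 10 ppm mass tolerance, the accuracy quoted for the lock-mass-corrected data elsewhere in this document, and the +/-0.2 minute reference-retention-time window mentioned above. Like the `length(indHitA) == 1` test in the listing, it accepts a match only when exactly one candidate falls inside both windows. The function name and signature are hypothetical.

```python
def track(entity_mass, entity_ref_rt, cand_mass, cand_ref_rt,
          tol_ppm=10.0, rt_window=0.2):
    """Index of the single candidate within both the mass window (ppm) and the
    reference-retention-time window (min), or None if zero or several match."""
    hits = [k for k, (m, t) in enumerate(zip(cand_mass, cand_ref_rt))
            if abs(m - entity_mass) <= tol_ppm * 1e-6 * entity_mass
            and abs(t - entity_ref_rt) <= rt_window]
    return hits[0] if len(hits) == 1 else None
```

Widening `rt_window` to several minutes shows the problem described in the introduction: two candidates with the same mass both fall inside the window, and the match becomes ambiguous.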

Tryptic digests can contain upwards of 10,000 unique masses, giving rise to nearly 100,000 ions that can be detected in a 2-hour LC separation followed by online MS detection.
4) TITLE:

Protocols to Assure Reproducible Quantitative and Qualitative Analysis of Tryptic Digests of Complex Protein Mixtures for Global Proteomic Experiments



Introduction: Meaningful results in qualitative and quantitative proteomics, such as the observation of differing expression levels of a protein across a series of samples, can be obtained only if samples are consistently prepared and analyzed. Tryptic digestion must be carried to completion for all proteins in order to maximize sequence coverage for identification and to allow meaningful quantitative sample-to-sample comparison of a given peptide. Chromatographic separation of the resulting mixtures must also be performed consistently.

We have developed protocols for tryptic digestion of protein mixtures designed to assure 
reproducible peptide production, protocols to assure maximum reproducibility of capillary scale 
HPLC, and software tools to easily verify the reproducibility of our experiments. 

Methods: A series of replicate digests of commercial rat serum was prepared. A proprietary detergent (RapiGest™ SF, Waters Corporation) was used as a denaturing agent. One or more standardized tryptic digests of individual proteins (MassPrep™ Digestion Standards, Waters Corporation) were added to the digests. Samples were analyzed by injection onto a 300 micron diameter x 15 cm column packed with Atlantis™ dC18 packing and eluted with a water/acetonitrile/formic acid gradient. The column effluent was directed to a Nano Lockspray source on a hybrid quadrupole time-of-flight mass spectrometer (Q-ToF Ultima API, Waters Corporation). Mass spectral data were obtained with alternating scans of low and high collision cell energy. A separate reference sample spectrum was obtained every 10 seconds.

Results: The use of the detergent as a denaturing agent was found not to interfere with chromatography or ionization of the tryptic peptides, and no fouling of the ion source was observed.

Sample consistency was demonstrated as follows. Raw mass spectral data were processed by Protein Lynx Global Server (Waters Corporation) to compile a list of data points as pairs of retention time and accurate mass value (the m/z observed at that moment, corrected using the reference mass channel and accurate to 10 ppm or less). The resulting data are then submitted to a software tool (Track 3D, Waters Corporation, patent pending) that correlates the retention times, accurate mass values, and signal intensities of two or more samples. The results of this correlation, displayed as a plot of retention-time difference versus retention time for any pair of data sets, show that for a great plurality of the observed signals, a given mass is observed at a similar retention time from sample to sample. Furthermore, the data that replicate in this fashion represent a very high percentage of the total ion signal intensity for all the data in question, demonstrating reproducibility from sample to sample.

Fuller details of our protocols will be included in the poster. 